13 research outputs found
Deep visible and thermal image fusion for enhanced pedestrian visibility
Reliable vision in challenging illumination conditions is one of the crucial requirements of future autonomous automotive systems. In the last decade, thermal cameras have become more easily accessible to a larger number of researchers. This has resulted in numerous studies which confirmed the benefits of thermal cameras in limited visibility conditions. In this paper, we propose a learning-based method for visible and thermal image fusion that focuses on generating fused images with high visual similarity to regular truecolor (red-green-blue or RGB) images, while introducing new informative details in pedestrian regions. The goal is to create natural, intuitive images that would be more informative than a regular RGB camera to a human driver in challenging visibility conditions. The main novelty of this paper is the idea to rely on two types of objective functions for optimization: a similarity metric between the RGB input and the fused output to achieve natural image appearance; and an auxiliary pedestrian detection error to help define relevant features of the human appearance and blend them into the output. We train a convolutional neural network using image samples from variable conditions (day and night) so that the network learns the appearance of humans in the different modalities and creates more robust results applicable in realistic situations. Our experiments show that the visibility of pedestrians is noticeably improved, especially in dark regions and at night. Compared to existing methods, ours better learns context and defines fusion rules focused on pedestrian appearance, which is not guaranteed by methods driven by low-level image quality metrics.
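The dual-objective idea can be pictured as a weighted sum of a similarity term and an auxiliary detection term. The sketch below is a minimal illustration only: the MSE similarity stand-in, the scalar detection error, and the weight `lam` are assumptions, not the paper's actual loss functions.

```python
def mse(a, b):
    """Mean squared error between two equally sized pixel lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def fusion_loss(fused, rgb, detection_error, lam=0.5):
    # Similarity term: keep the fused output visually close to the RGB input.
    # Detection term: penalize pedestrian detection error on the fused image,
    # pushing informative thermal details into pedestrian regions.
    return mse(fused, rgb) + lam * detection_error
```

In a real training loop both terms would be differentiable network outputs; here they are plain numbers to show how the two objectives trade off via `lam`.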
Efficient training procedures for multi-spectral demosaicing
The simultaneous acquisition of multi-spectral images on a single sensor can be efficiently performed by single-shot capture using a multi-spectral filter array. This paper focuses on the demosaicing of color and near-infrared bands and relies on a convolutional neural network (CNN). To train the deep learning model robustly and accurately, it is necessary to provide enough training data, with sufficient variability. We focus on the design of an efficient training procedure by discovering an optimal training dataset. We propose two data selection strategies, motivated by slightly different concepts. The general term that will be used for the proposed models trained using data selection is data selection-based multi-spectral demosaicing (DSMD). The first idea is clustering-based data selection (DSMD-C), with the goal to discover a representative subset with high variance so as to train a robust model. The second is adaptive data selection (DSMD-A), a self-guided approach that selects new data based on the current model accuracy. We performed a controlled experimental evaluation of the proposed training strategies, and the results show that a careful selection of data does benefit the speed and accuracy of training. We are still able to achieve high reconstruction accuracy with a lightweight model.
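The clustering-based selection (DSMD-C) aims at a representative, high-variance subset of the training data. A greedy farthest-point sampler is one simple stand-in for that idea, sketched below on 2-D feature points; the paper's actual clustering procedure may differ.

```python
import math

def farthest_point_selection(samples, k):
    """Greedily pick k samples that spread out over feature space.

    Starts from the first sample and repeatedly adds the sample whose
    distance to its nearest already-chosen sample is largest, which
    favours a diverse (high-variance) subset.
    """
    chosen = [samples[0]]
    while len(chosen) < k:
        best = max((s for s in samples if s not in chosen),
                   key=lambda s: min(math.dist(s, c) for c in chosen))
        chosen.append(best)
    return chosen
```

An adaptive scheme (in the spirit of DSMD-A) would instead rank candidates by the current model's reconstruction error and add the hardest ones first.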
The effect of the color filter array layout choice on state-of-the-art demosaicing
Interpolation from a Color Filter Array (CFA) is the most common method for obtaining full color image data. Its success relies on the smart combination of a CFA and a demosaicing algorithm. Demosaicing, on the one hand, has been extensively studied. Algorithmic development in the past 20 years ranges from simple linear interpolation to modern neural-network-based (NN) approaches that encode the prior knowledge of millions of training images to fill in missing data in an inconspicuous way. CFA design, on the other hand, is less well studied, although still recognized to strongly impact demosaicing performance. This is because demosaicing algorithms are typically limited to one particular CFA pattern, impeding straightforward CFA comparison. This is starting to change with newer classes of demosaicing that may be considered generic or CFA-agnostic. In this study, by comparing the performance of two state-of-the-art generic algorithms, we evaluate the potential of modern CFA-demosaicing. We test the hypothesis that, with the increasing power of NN-based demosaicing, the influence of optimal CFA design on system performance decreases. This hypothesis is supported by the experimental results. Such a finding would herald the possibility of relaxing CFA requirements, providing more freedom in the CFA design choice and producing high-quality cameras.
Weakly supervised deep learning method for vulnerable road user detection in FMCW radar
Millimeter-wave radar is currently the most effective automotive sensor capable of all-weather perception. In order to detect Vulnerable Road Users (VRUs) in cluttered radar data, it is necessary to model the time-frequency signal patterns of human motion, i.e. the micro-Doppler signature. In this paper we propose a spatio-temporal Convolutional Neural Network (CNN) capable of detecting VRUs in cluttered radar data. The main contribution is a weakly supervised training method which uses abundant, automatically generated labels from camera and lidar for training the model. The input to the network is a tensor of temporally concatenated range-azimuth-Doppler arrays, while the ground truth is an occupancy grid formed by objects detected jointly in camera images and lidar. Lidar provides accurate ranging ground truth, while camera information helps distinguish between VRUs and background. Experimental evaluation shows that the CNN model has superior detection performance compared to classical techniques. Moreover, the model trained with imperfect, weak supervision labels outperforms the one trained with a limited number of perfect, hand-annotated labels. Finally, the proposed method has excellent scalability due to the low cost of automatic annotation.
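The weak labels can be pictured as an occupancy grid rasterized from camera/lidar-confirmed detections. The toy sketch below assumes polar detections (range in metres, azimuth in radians) and a fixed grid resolution; both are illustrative choices, not the paper's actual configuration.

```python
import math

def detections_to_grid(detections, size=8, max_range=40.0):
    """Rasterize (range_m, azimuth_rad) detections into a size x size occupancy grid.

    Row 0 is at the sensor; columns span [-max_range, +max_range] laterally.
    Each confirmed VRU detection marks one cell as occupied (1).
    """
    grid = [[0] * size for _ in range(size)]
    cell = 2.0 * max_range / size              # metres per cell
    for rng, az in detections:
        x = rng * math.sin(az)                 # lateral offset
        y = rng * math.cos(az)                 # forward distance
        col = int((x + max_range) // cell)
        row = int(y // cell)
        if 0 <= row < size and 0 <= col < size:
            grid[row][col] = 1
    return grid
```

Such a grid serves as the training target, while the network input would be the concatenated range-azimuth-Doppler tensor described above.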
RGB-NIR demosaicing using deep residual U-Net
Multi-spectral image acquisition brings numerous potential benefits in computer vision and image processing applications. Single-sensor acquisition helps to overcome problems with misalignments occurring in multiple-sensor acquisition. However, the single-sensor approach poses the problem of interpolation of missing values. In this paper we propose an adapted version of a residual U-Net, with application in demosaicing. The experiments show that the proposed method achieves state-of-the-art results and has good generalization capabilities across different color filter array patterns.
HDR video synthesis for vision systems in dynamic scenes
High dynamic range (HDR) image generation from a number of differently exposed low dynamic range (LDR) images has been extensively explored in the past few decades, and as a result of these efforts a large number of HDR synthesis methods have been proposed. Since HDR images are synthesized by combining well-exposed regions of the input images, one of the main challenges is dealing with camera or object motion. In this paper we propose a method for the synthesis of HDR video from a single camera using multiple, differently exposed video frames, with circularly alternating exposure times. One of the potential applications of the system is in driver assistance systems and autonomous vehicles, involving significant camera and object movement, non-uniform and temporally varying illumination, and the requirement of real-time performance. To achieve these goals simultaneously, we propose an HDR synthesis approach based on weighted averaging of aligned radiance maps. The computational complexity of high-quality optical flow methods for motion compensation is still prohibitively high for real-time applications. Instead, we rely on more efficient global projective transformations to solve camera movement, while moving objects are detected by thresholding the differences between the transformed and brightness-adapted images in the set. To attain temporal consistency of the camera motion in the consecutive HDR frames, the parameters of the perspective transformation are stabilized over time by means of computationally efficient temporal filtering. We evaluated our results on several reference HDR videos, on synthetic scenes, and using 14-bit raw images taken with a standard camera.
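The weighted averaging of aligned radiance values can be sketched per pixel with a simple mid-tone weighting. This is an illustration only: the triangular weight and the fallback for badly exposed pixels are assumptions, not the paper's exact formulation.

```python
def exposure_weight(p):
    """Triangular weight favouring well-exposed mid-tone intensities in [0, 1]."""
    return max(0.0, 1.0 - 2.0 * abs(p - 0.5))

def fuse_radiance(pixels):
    """Weighted average of aligned per-pixel radiance estimates.

    Over- and under-exposed values get low weight; if every input is
    badly exposed the plain mean is used as a fallback.
    """
    weights = [exposure_weight(p) for p in pixels]
    total = sum(weights)
    if total == 0.0:
        return sum(pixels) / len(pixels)
    return sum(w * p for w, p in zip(weights, pixels)) / total
```

In the full pipeline this averaging runs only on pixels that pass the motion test (thresholded difference between transformed, brightness-adapted frames), so moving objects are not ghosted across exposures.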
Multi-focus image fusion based on edge-preserving filters
To break the limitation of camera imaging and acquire abundant information with multi-focus images, we present a novel multi-focus image fusion method based on edge-preserving filters. In this paper, the focusing level is measured by two cost functions and the focus map is constructed in a winner-take-all manner. Besides, we demonstrate that the guided image filter (GIF) and the fast global smoother (FGS) have different advantages in image structure transferring and image smoothing, which can be utilized to construct precise fusion weight maps. Combining the weight maps acquired by GIF and FGS, the accurate all-in-focus image is obtained using a secondary fusion strategy. Experimental results show that the proposed method is competitive with or even outperforms many state-of-the-art methods, including a recent CNN-based fusion method, while being less time-consuming.
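The winner-take-all construction boils down to a per-pixel argmax over focus measures, followed by selection. The bare-bones sketch below omits the two cost functions and the GIF/FGS weight-map refinement, which are the paper's actual contributions.

```python
def winner_take_all(score_a, score_b):
    """Binary focus map: 1 selects image A, 0 selects image B, per pixel."""
    return [1 if a >= b else 0 for a, b in zip(score_a, score_b)]

def select_fusion(img_a, img_b, focus_map):
    """Pick each pixel from the image judged more in focus."""
    return [a if m else b for a, b, m in zip(img_a, img_b, focus_map)]
```

In the full method this raw binary map is refined into smooth weight maps by the edge-preserving filters before the secondary fusion step.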
Low-Complexity Deep HDR Fusion and Tone Mapping for Urban Traffic Scenes
In this paper we propose a computationally efficient neural network for high dynamic range fusion and tone mapping, for application in perception systems of autonomous vehicles. The proposed approach fuses two consecutive, differently exposed images into a single output with good exposure in all regions, in a standard dynamic range. Motion is compensated based on fast optical flow estimation, and subsequently by including an error mask as an input to the network to indicate the remaining artifact-prone regions. This is an efficient way for the network to learn to reduce ghosting artifacts without increasing computational complexity. Unlike the conventional approach, we train the network on versatile traffic data and evaluate the performance based on object detection quality metrics rather than visual quality. The performance was compared to a similarly complex representative method from the literature. We achieved improved performance in challenging light conditions due to the robustness of our method in variable traffic conditions.
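The error mask for remaining misalignment can be pictured as a simple threshold on the residual between the reference frame and the flow-compensated frame. This is a toy version; the threshold value and the per-pixel formulation are assumptions, not the paper's exact definition.

```python
def alignment_error_mask(reference, warped, tau=0.1):
    """1 marks artifact-prone pixels where the alignment residual exceeds tau.

    `reference` is the target frame; `warped` is the other exposure after
    flow compensation and brightness adaptation. The resulting binary mask
    is fed to the network as an extra input channel so it can suppress
    ghosting in exactly those regions.
    """
    return [1 if abs(r - w) > tau else 0 for r, w in zip(reference, warped)]
```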
Automatic labeling of vulnerable road users in multi-sensor data
A growing interest in technologies for autonomous driving emphasizes the demand for safe and reliable perception systems in various driving conditions. The current state-of-the-art perception solutions rely on data-driven machine learning approaches and require large amounts of annotated data to train accurate models. In this study we have identified limitations in the existing radar-based traffic datasets, and propose a richer, annotated raw radar dataset. The proposed solution is a semi-automatic data labeling tool, which generates an initial set of candidate annotations using state-of-the-art automatic object recognition algorithms and requires only minimal manual intervention. In the first qualitative evaluation of its kind for automotive radar datasets, we measure the quality of automatically computed labels under various light conditions, occlusion, behavior and modeling bias, based on a multitude of tracking metrics. We determined the specific cases where automatic labeling is sufficient and where a human annotator needs to inspect and manually correct errors made by the algorithms.